      pm25 fips region longitude latitude
1 16.19452 6019   west -119.9035 36.63837
2 15.80378 6029   west -118.6833 35.29602
3 18.44073 6031   west -119.8113 36.15514
4 16.66180 6037   west -118.2342 34.08851
5 15.01573 6047   west -120.6741 37.24578
6 17.42905 6065   west -116.8036 33.78331
7 16.25190 6099   west -120.9588 37.61380
8 16.18358 6107   west -119.1661 36.23465
---
title: "Insights into PM2.5"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: sandstone
      navbar-bg: "green"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(dplyr)
library(corrgram)
library(DT)
library(plotly)
pm25data<-read.csv("avgpm25.csv")
```
Overview
===
Column {data-width=550}
-----------------------------------------------------------------------
- Particulate matter (PM) pollution refers to particles found in the air as pollutants. Some are emitted directly from a source, such as a fire or construction site, but most form in the atmosphere through chemical reactions. Particles come in many shapes and sizes and are categorized accordingly.
- Small particulate matter can be inhaled and cause serious health problems. Some particles less than 10 micrometers in diameter can get deep into your lungs and some may even get into your bloodstream.
- Fine particles are also the main cause of reduced visibility (haze) in parts of the United States, including many of our treasured national parks and wilderness areas.
Column {.tabset data-width=450}
-----------------------------------------------------------------------
### Types of PM
- PM10
    - inhalable particles
    - diameters that are generally 10 micrometers and smaller
- PM2.5
    - fine inhalable particles
    - diameters that are generally 2.5 micrometers and smaller
    - PM2.5 pose the greatest risk to health
### Source
Information about PM used here was obtained from the [United States Environmental Protection Agency](https://www.epa.gov/pm-pollution/particulate-matter-pm-basics#PM).
Data
===
Column {data-width=500}
---
### <b><font size = 4><span Style = "color:blue">Data Table</span></font></b>
```{r show_table}
datatable(pm25data[1:50,], rownames=FALSE)
```
Column {data-width=500}
---
### <span Style = "color:red">Data Info</span>
This data set is used to investigate the research question: <span Style = "color:blue">Are there any counties in the U.S. that exceed the national standard for fine particle pollution?</span>

- The data contains the following variables:
    - <span Style = "color:green">pm25</span> is the average PM2.5 level.
    - <span Style = "color:green">fips</span> is the five-digit code indicating the county.
    - <span Style = "color:green">region</span> is the area of the country (east/west) the county is located in.
    - <span Style = "color:green">longitude</span> is the longitude of the centroid for that county.
    - <span Style = "color:green">latitude</span> is the latitude of the centroid for that county.
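
The data preview shows `fips` values such as `6019` with only four digits, which suggests the leading zero was dropped when the column was read as a number. The sketch below restores the five-digit form. It assumes the codes are whole numbers, and `pad_fips` is a hypothetical helper name that is not part of the original analysis.

```r
# Hypothetical helper: format numeric FIPS codes as five-character strings.
pad_fips <- function(fips) {
  # as.integer() guards against codes stored as doubles;
  # "%05d" left-pads with zeros to a width of five
  sprintf("%05d", as.integer(fips))
}

# FIPS values copied from the first rows of the data preview
fips_sample <- c(6019, 6029, 6031, 6037)
padded <- pad_fips(fips_sample)
print(padded)
```

Another option, if the file itself stores the leading zeros, may be to read the column as text with `read.csv("avgpm25.csv", colClasses = c(fips = "character"))`.
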
Boxplot
===
Column {data-width=500}
---
```{r box1}
ggplot(pm25data, aes(x=pm25)) + geom_boxplot(fill="lightgreen")+labs(title="Distribution of Average Fine Particle Pollution Levels", x= "PM2.5")
```
Column {data-width=500}
---
### Analysis
- The distribution looks fairly symmetric because the median sits near the center of the box, which spans the middle 50% of the data.
- The box plot also shows many outliers, drawn as dots beyond the whiskers. There appear to be 4 outliers below the lower fence and 6 above the upper fence.
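
The outlier counts above are read off the plot by eye. A minimal sketch for checking them numerically follows, assuming the usual 1.5 × IQR fence rule that `geom_boxplot()` applies by default. `count_outliers` is a hypothetical helper, and the input values are made up for illustration.

```r
# Count points outside the 1.5 * IQR fences.
count_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[2] + coef * iqr
  c(below = sum(x < lower_fence, na.rm = TRUE),
    above = sum(x > upper_fence, na.rm = TRUE))
}

# Made-up values for illustration only (not from avgpm25.csv)
toy <- c(1, 10, 11, 12, 13, 14, 30)
outlier_counts <- count_outliers(toy)
print(outlier_counts)
```

Calling `count_outliers(pm25data$pm25)` on the real data would give counts to compare against the 4 and 6 estimated from the plot.
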
PM2.5 standard
===
Column {data-width=400}
---
```{r}
filter(pm25data, pm25>15)
```
Column {.tabset data-width=600}
---
### Background
- PM standards are air quality standards that specify the maximum amount of PM allowed in outdoor air. There are different standards for PM10 and PM2.5.
- Limiting PM pollution in the air protects human health and the environment.
- The annual national ambient air quality standard for PM2.5 used in this analysis is 12 micrograms per cubic meter; it was previously 15.
### Analysis
- The counties with PM2.5 above 15 are all located in the western region of the United States, as shown by the "region" variable.
- Additionally, the "fips" variable holds the five-digit codes that identify counties in the United States. Looking up the [fips](https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt#:~:text=FIPS%20codes%20are%20numbers%20which,to%20which%20the%20county%20belongs.) codes in this filtered data shows that all of these counties are in California. This suggests an association between the western US, specifically California, and higher fine particle pollution.
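
The California lookup can also be done in code rather than by hand. A county FIPS code is a two-digit state code followed by a three-digit county code, and California's state code is 06. The sketch below uses the eight rows from the data preview at the top of this document; the data frame name `exceed` and the column `state_code` are introduced here for illustration.

```r
# Rows copied from the data preview; every pm25 value is above 15
exceed <- data.frame(
  pm25 = c(16.19452, 15.80378, 18.44073, 16.66180,
           15.01573, 17.42905, 16.25190, 16.18358),
  fips = c(6019L, 6029L, 6031L, 6037L, 6047L, 6065L, 6099L, 6107L)
)

# Integer division by 1000 drops the three county digits
# and leaves the state code
exceed$state_code <- exceed$fips %/% 1000L

# Tabulate exceedances per state; 6 is the state code for California
state_counts <- table(exceed$state_code)
print(state_counts)
```

On the full data, the same idea could be applied to `filter(pm25data, pm25 > 15)`, and swapping 15 for 12 would give the counties above the newer standard.
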
Regional PM2.5
===
Column {data-width=500}
---
```{r box2}
ggplot(pm25data, aes(x=pm25, y=region))+geom_boxplot(fill="lightgreen")+labs(title="Distribution of Average Fine Particle Pollution Levels by Region", x="PM2.5", y="Region in US")
```
Column {data-width=500}
---
### Analysis
- This graph shows that the eastern US has a higher median PM2.5 level than the west, and that the western US has a larger spread, or variation, than the east.
- The eastern US has 7 outliers below the lower fence of the boxplot and none above the upper fence; conversely, the western US has 7 outliers above the upper fence and none below the lower fence.
- The data for the eastern US looks roughly symmetric, consistent with a normal distribution, when outliers are not considered, while the western US is skewed right even without the outliers.
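
The comparison of medians and spreads can be backed by numbers. This is a minimal sketch using base R's `tapply()`. The values in `toy` are made up for illustration and are not from `avgpm25.csv`; on the real data, `pm25data` would take the place of `toy`.

```r
# Made-up values for illustration only
toy <- data.frame(
  pm25 = c(9, 10, 11, 12, 13, 4, 6, 8, 12, 18),
  region = rep(c("east", "west"), each = 5)
)

# Median and interquartile range of pm25 within each region
region_median <- tapply(toy$pm25, toy$region, median)
region_iqr <- tapply(toy$pm25, toy$region, IQR)
print(rbind(median = region_median, IQR = region_iqr))
```

A higher median with a smaller IQR for one region would match the pattern described for the east in the boxplot.
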
Violin
===
Column {data-width=500}
---
```{r violin}
ggplot(pm25data, aes(x=pm25, y=region)) + geom_violin(fill="lightgreen") + labs(title= "Distribution of Average Fine Particle Pollution Levels by Region", x= "PM2.5", y="Region in the US")
```
Column {data-width=500}
---
### Analysis
- The violin plot is a combination of a kernel density plot and a box plot. The wider the violin at a given value, the higher the density of data at that value.
- Based on this, the highest density of data for the west is around a PM2.5 of 7, and for the east it is around 10-11.
- The violin plots also show that the western US has a much larger spread than the east, which gives the west a right-skewed shape. The east has outliers around a PM2.5 of 5, which make its distribution more left-skewed.
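
The widest point of a violin corresponds to the peak of a kernel density estimate. The sketch below locates that peak with base R's `density()`, assuming a Gaussian kernel and the default bandwidth. `density_peak` is a hypothetical helper, and the input is simulated for illustration.

```r
# Estimate where a violin is widest: the peak of the kernel density estimate.
density_peak <- function(x) {
  d <- density(x, na.rm = TRUE)  # Gaussian kernel, default bandwidth
  d$x[which.max(d$y)]
}

# Simulated values for illustration only (not from avgpm25.csv)
set.seed(1)
toy <- rnorm(200, mean = 10, sd = 1)
peak <- density_peak(toy)
print(peak)
```

On the real data, `sapply(split(pm25data$pm25, pm25data$region), density_peak)` would give one estimate per region to compare with the values of about 7 and 10-11 read off the plot.
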
Histogram
===
Column {data-width=500}
---
```{r hist1}
# annotate() draws the label once; geom_text() with aes() would redraw it for every row of the data
ggplot(pm25data, aes(x=pm25))+geom_histogram(fill="lightgreen")+geom_vline(xintercept=12)+annotate("text",x=12,y=40,label="PM2.5 Standard, 12")+labs(title="Distribution of Average Fine Particle Pollution Levels",x="PM2.5")
```
Column {data-width=500}
---
### Analysis
- This graph compares the data with the PM2.5 standard. There are several observations that exceed the standard.
- This histogram shows us that this data has a large spread.
- It also looks to be bimodal, due to having two tall peaks in the center of the data separated by a dip.
- The center of the data is around 11 PM2.5.
- The distribution is skewed right because of the high outliers in the upper tail.
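
Skew can be judged numerically as well as visually. A minimal sketch of the moment-based sample skewness follows; positive values indicate a longer right tail. The function name `skewness` and the input vectors are introduced here for illustration.

```r
# Moment-based sample skewness: third central moment
# divided by the 1.5 power of the second central moment.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Made-up values for illustration only
toy_right <- c(1, 2, 3, 4, 20)
toy_symmetric <- c(1, 2, 3, 4, 5)
print(skewness(toy_right))
print(skewness(toy_symmetric))
```

Applied to `pm25data$pm25`, a positive result would support the right-skew reading, and `mean(pm25data$pm25 > 12)` would give the share of counties above the standard.
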
Facet Histogram
===
Column {data-width=500}
---
```{r}
ggplot(pm25data, aes(x=pm25))+geom_histogram(fill="darkgreen")+facet_wrap(~region)+labs(title="Distribution of Average Fine Particle Pollution Levels by Region",x="PM2.5")
```
Column {data-width=350}
---
### Analysis
- The distribution of PM2.5 in the eastern US looks slightly skewed left, and the western US is skewed right.
- The eastern US distribution is much taller, likely because it contains more observations. There are also two very tall, distinct peaks in the center of the distribution, giving the impression of a multimodal distribution.
- The western US distribution is much shorter, likely because it contains fewer observations. It looks fairly unimodal, with the highest peak in the center of the data. Many data points lie well above the center and look like potential outliers, giving it the right-skewed shape.
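
Because bar heights in a count histogram depend on how many observations each region has, shapes are easier to compare on a density scale. The sketch below uses base R's `hist()` with shared bins. The two vectors are made up for illustration, with deliberately unequal sizes.

```r
# Made-up values for illustration only: 12 "east" and 4 "west" observations
east <- c(9, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 14)
west <- c(5, 7, 9, 16)

breaks <- seq(4, 16, by = 2)  # shared bins so the regions are comparable
h_east <- hist(east, breaks = breaks, plot = FALSE)
h_west <- hist(west, breaks = breaks, plot = FALSE)

# Counts depend on group size; densities integrate to 1 for each group
area_east <- sum(h_east$density * diff(h_east$breaks))
area_west <- sum(h_west$density * diff(h_west$breaks))
print(c(east = area_east, west = area_west))
```

In ggplot2, mapping `aes(y = after_stat(density))` inside `geom_histogram()` puts each facet on this scale.
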
Scatterplot
===
Row {.tabset data-height=350}
---
### Analysis - Latitude and PM2.5 in the Eastern US
- From this graph, I interpret that Latitude and PM2.5 have a stronger association in the eastern US, because the scatterplot points are more tightly clustered in one region of the plot.
- The points trace an almost quadratic shape: PM2.5 is lower at low latitudes, rises as latitude increases up to approximately 40 degrees, and then starts decreasing.
- Few points stray from this central region of the data, which supports a strong association between Latitude and PM2.5 in the eastern US.
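
One way to check the quadratic reading is to fit PM2.5 as a second-degree polynomial in latitude and look at the sign of the squared term. This is a sketch only: the data are simulated to peak near latitude 40, and the model form is an assumption that is not part of the original analysis.

```r
# Simulated values for illustration only: PM2.5 peaks near latitude 40
latitude <- seq(30, 48, length.out = 50)
pm25 <- 12 - 0.05 * (latitude - 40)^2 + 0.1 * sin(seq_along(latitude))
toy <- data.frame(latitude, pm25)

# Fit PM2.5 as a quadratic function of latitude
fit <- lm(pm25 ~ latitude + I(latitude^2), data = toy)
b <- coef(fit)

# A negative squared term means the curve rises and then falls;
# the turning point is at -b1 / (2 * b2)
turning_point <- unname(-b["latitude"] / (2 * b["I(latitude^2)"]))
print(turning_point)
```

Fitting the same model to `filter(pm25data, region == "east")` would show whether the eastern data has a turning point near 40.
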
### Analysis - Latitude and PM2.5 in the Western US
- On the other hand, the western US shows a weaker association between these two variables, because the scatterplot points are more spread out and show no clear pattern.
- More points also lie far from the central cluster of the data, giving the appearance of outliers.
- Overall, given the lack of a pattern and the wider spread of points, the western US seems to have a weak or no association between Latitude and PM2.5.
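
The strength of association in each region can be summarized with a correlation coefficient per group. The sketch below uses made-up values chosen so that one group is perfectly linear and the other is scattered; on the real data, `pm25data` would take the place of `toy`.

```r
# Made-up values for illustration only
toy <- data.frame(
  region = rep(c("east", "west"), each = 5),
  latitude = c(30, 33, 36, 39, 42, 32, 35, 38, 41, 44),
  pm25 = c(8, 9, 10, 11, 12, 9, 5, 12, 6, 9)
)

# Pearson correlation between latitude and PM2.5 within each region
cor_by_region <- sapply(split(toy, toy$region),
                        function(d) cor(d$latitude, d$pm25))
print(cor_by_region)
```

Pearson correlation measures linear association only, so a curved pattern can produce a low coefficient even when the points follow a clear shape.
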
Row {data-height=650}
---
### **Scatterplot**
```{r}
ggplot(pm25data, aes(x=pm25, y=latitude, color=region))+geom_point()+labs(title="Distribution of Average Fine Particle Pollution Levels by Latitude", x="PM2.5", y="Latitude")
```
### **Facet Scatterplot**
```{r}
ggplot(pm25data, aes(x=pm25, y=latitude))+geom_point(color="purple")+facet_wrap(~region) +labs(title="Distribution of Average Fine Particle Pollution Levels by Latitude", x="PM2.5", y="Latitude")
```
Correlogram
===
Column {data-width=500}
---
```{r}
cgram<-select(pm25data, pm25, latitude, longitude)
corrgram(cgram, lower.panel = panel.pts, upper.panel = panel.shade)
```
Column {data-width=500}
---
### Analysis
- This corrgram is essentially a correlation matrix for the variables PM2.5, Latitude, and Longitude. The light pink shading tells us that PM2.5 and Latitude are very weakly negatively associated: as Latitude increases, PM2.5 tends to decrease, and vice versa.
- Between PM2.5 and Longitude, the light blue shading shows a weak positive association: as Longitude increases, PM2.5 tends to increase, and vice versa.
- Lastly, the corrgram also describes the relationship between Latitude and Longitude. The scatterplot for these variables shows much less of a relationship than the previous two pairs, and the shading indicates a very weak negative association: as Longitude increases, Latitude tends to decrease, and vice versa.
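
The shading in a corrgram encodes pairwise correlation coefficients, which can be printed directly with `cor()`. The sketch below uses only the eight California rows from the data preview at the top of this document, so its coefficients will differ from those of the full data set.

```r
# Rows copied from the data preview (California counties only)
preview <- data.frame(
  pm25 = c(16.19452, 15.80378, 18.44073, 16.66180,
           15.01573, 17.42905, 16.25190, 16.18358),
  longitude = c(-119.9035, -118.6833, -119.8113, -118.2342,
                -120.6741, -116.8036, -120.9588, -119.1661),
  latitude = c(36.63837, 35.29602, 36.15514, 34.08851,
               37.24578, 33.78331, 37.61380, 36.23465)
)

# Pairwise Pearson correlations, the numbers a corrgram encodes as colors
cor_matrix <- round(cor(preview), 2)
print(cor_matrix)
```

Running `cor(cgram)` on the full selection gives the values behind the light pink and light blue panels.
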